Efficient successor retrieval operations for aggregate query processing on clustered road networks
Cataloged from PDF version of article. Get-Successors (GS), which retrieves all successors of a junction, is a kernel operation used to facilitate aggregate computations in road network queries. Efficient implementation of the GS operation is crucial, since the disk access cost of this operation constitutes a considerable portion of the total query processing cost. First, we propose a new successor retrieval operation, Get-Unevaluated-Successors (GUS), which retrieves only the unevaluated successors of a given junction. The GUS operation is an efficient implementation of the GS operation in which the candidate successors to be retrieved are pruned according to the properties and state of the algorithm. Second, we propose a hypergraph-based model for clustering junctions that are successively retrieved by GUS operations into the same disk pages. The proposed model utilizes query logs to correctly capture the disk access cost of GUS operations. The proposed GUS operation and the associated clustering model are evaluated for two different instances of GUS operations, which typically arise in Dijkstra's single-source shortest path algorithm and in the incremental network expansion framework. Our simulation results show that the proposed successor retrieval operation, together with the proposed clustering hypergraph model, is quite effective in reducing the number of disk accesses in query processing. © 2010 Published by Elsevier Inc.
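To make the difference between GS and GUS concrete, the Python sketch below runs Dijkstra's algorithm over a small in-memory graph and counts how many successor records each retrieval style would fetch. It assumes that a junction is "unevaluated" exactly when Dijkstra has not yet settled it; the names `get_successors`, `get_unevaluated_successors`, and the `prune` flag are hypothetical, and neither disk pages nor the hypergraph-based clustering are modeled.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-in for a road network: junction -> [(successor, length)].
Graph = Dict[int, List[Tuple[int, float]]]


def get_successors(graph: Graph, node: int) -> List[Tuple[int, float]]:
    """GS: every successor record of the junction is retrieved."""
    return list(graph.get(node, []))


def get_unevaluated_successors(graph: Graph, node: int, settled: Set[int]) -> List[Tuple[int, float]]:
    """GUS-style retrieval: successors already settled by Dijkstra are pruned."""
    return [(v, w) for v, w in graph.get(node, []) if v not in settled]


def dijkstra(graph: Graph, source: int, prune: bool = True) -> Tuple[Dict[int, float], int]:
    """Return shortest distances and the number of successor records retrieved.

    In this in-memory sketch the adjacency list is still scanned; the counter
    only records how many successor records a disk-based store would fetch.
    """
    dist = {source: 0.0}
    settled: Set[int] = set()
    heap = [(0.0, source)]
    retrieved = 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        succs = get_unevaluated_successors(graph, u, settled) if prune else get_successors(graph, u)
        retrieved += len(succs)
        for v, w in succs:
            nd = d + w
            # With non-negative lengths, settled junctions are never improved.
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, retrieved
```

On a three-junction triangle, for example, both variants return the same distances, but the pruned variant skips the records that point back to already settled junctions.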
Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication
For outer-product-parallel sparse matrix-matrix multiplication (SpGEMM) of the form C=A×B, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices A and B, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and the multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on vertex weights encode balancing the computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems, JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster), show that the proposed hypergraph models attain high speedup values. © 2014 Society for Industrial and Applied Mathematics
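A minimal sketch of the two-phase scheme, assuming dictionary-of-dictionaries storage: the shared index k is split across processors (conformable columnwise A and rowwise B), each processor forms its outer products locally, and partial results are then accumulated at an owner. Rowwise ownership of C (roughly the second model above) is assumed through the hypothetical `owner_of_row` callback; the returned volume simply counts partial entries produced away from their owner, and no hypergraph partitioning is performed.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

SparseCols = Dict[int, Dict[int, float]]  # k -> {i: A[i][k]}  (column k of A)
SparseRows = Dict[int, Dict[int, float]]  # k -> {j: B[k][j]}  (row k of B)


def local_outer_products(a_cols: SparseCols, b_rows: SparseRows, ks: List[int]) -> Dict[Tuple[int, int], float]:
    """Phase 1: communication-free local SpGEMM over the k indices owned by one processor."""
    c: Dict[Tuple[int, int], float] = defaultdict(float)
    for k in ks:
        for i, a in a_cols.get(k, {}).items():
            for j, b in b_rows.get(k, {}).items():
                c[(i, j)] += a * b  # outer product A[:, k] * B[k, :]
    return c


def two_phase_spgemm(
    a_cols: SparseCols,
    b_rows: SparseRows,
    k_parts: List[List[int]],
    owner_of_row: Callable[[int], int],
) -> Tuple[Dict[Tuple[int, int], float], int]:
    """Return C and the number of partial results that must travel to another processor."""
    partials = [local_outer_products(a_cols, b_rows, ks) for ks in k_parts]
    # Phase 2: single-node accumulation of every partial c_ij at its owner.
    c: Dict[Tuple[int, int], float] = defaultdict(float)
    volume = 0
    for p, local in enumerate(partials):
        for (i, j), val in local.items():
            if owner_of_row(i) != p:
                volume += 1  # this partial result is sent to the owner of row i
            c[(i, j)] += val
    return dict(c), volume
```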
Efficient fast Hartley transform algorithms for hypercube-connected multicomputers
Although the fast Hartley transform (FHT) provides efficient spectral analysis of real discrete signals, studies that address the parallelization of the FHT are extremely rare. The FHT is a real transformation and does not require any complex arithmetic. On the other hand, the FHT algorithm has an irregular computational structure, which makes efficient parallelization harder. In this paper, we propose an efficient restructuring of the sequential FHT algorithm that brings regularity and symmetry to the computational structure of the FHT. Then, we propose an efficient parallel FHT algorithm for medium-to-coarse-grain hypercube multicomputers by introducing a dynamic mapping scheme for the restructured FHT. The proposed parallel algorithm achieves perfect load balance, minimizes both the number and the volume of concurrent communications, allows only nearest-neighbor communications, and achieves in-place computation and communication. The proposed algorithm is implemented on a 32-node iPSC/2 hypercube multicomputer. High efficiency values are obtained even for small FHT problem sizes.
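For reference, the sequential computation that the paper restructures can be sketched as a textbook recursive radix-2 FHT. This is not the restructured or parallel algorithm from the abstract; it only illustrates where the irregularity comes from: each output combines the odd half at index k and at the reflected index -k (mod N/2), unlike the FFT butterfly. A naive O(N^2) transform with cas(θ) = cos θ + sin θ is included as a check.

```python
import math
from typing import List


def dht_naive(x: List[float]) -> List[float]:
    """Direct O(N^2) DHT: H[k] = sum_n x[n] * cas(2*pi*n*k/N), with cas = cos + sin."""
    n = len(x)
    out = []
    for k in range(n):
        total = 0.0
        for t in range(n):
            theta = 2 * math.pi * t * k / n
            total += x[t] * (math.cos(theta) + math.sin(theta))
        out.append(total)
    return out


def fht(x: List[float]) -> List[float]:
    """Recursive radix-2 decimation-in-time FHT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    even = fht(x[0::2])  # DHT of even-indexed samples
    odd = fht(x[1::2])   # DHT of odd-indexed samples
    out = [0.0] * n
    for k in range(n):
        km = k % half
        kr = (-k) % half  # reflected index: the source of the irregular structure
        theta = 2 * math.pi * k / n
        out[k] = even[km] + odd[km] * math.cos(theta) + odd[kr] * math.sin(theta)
    return out
```

Because the DHT is its own inverse up to a factor of 1/N, applying `fht` twice and dividing by N recovers the input.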
Performance of query processing implementations in ranking-based text retrieval systems using inverted indices
Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval. In this work, 11 alternative implementation techniques for these calculations are presented under four different categories, and their asymptotic time and space complexities are investigated. To our knowledge, six of these techniques have not been discussed in any previous publication. Furthermore, experiments are carried out on a 30 GB document collection to evaluate the practical performance of the different implementations in terms of query processing time and space consumption. The advantages and disadvantages of each technique are illustrated under different querying scenarios, and several experiments that investigate the scalability of the implementations are presented. © 2005 Elsevier Ltd. All rights reserved.
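One common way to realize the similarity calculation and ranking steps is term-at-a-time evaluation with a hash table of accumulators, followed by heap-based top-k selection. The sketch below assumes precomputed term weights in the postings and an inner-product similarity; it is one illustrative technique and is not claimed to be any particular one of the 11 implementations studied.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inverted index: term -> postings [(doc_id, term weight in doc)].
InvertedIndex = Dict[str, List[Tuple[int, float]]]


def rank_term_at_a_time(index: InvertedIndex, query: Dict[str, float], k: int) -> List[Tuple[int, float]]:
    """Return the k best (doc_id, score) pairs by inner-product similarity."""
    acc: Dict[int, float] = defaultdict(float)  # doc_id -> partial similarity
    for term, q_weight in query.items():
        # Process one inverted list at a time, updating the accumulators.
        for doc, d_weight in index.get(term, []):
            acc[doc] += q_weight * d_weight
    # Top-k selection with a heap; ties go to the smaller doc_id.
    return heapq.nlargest(k, acc.items(), key=lambda item: (item[1], -item[0]))
```

The accumulator table costs space proportional to the number of documents touched by the query, which is the kind of time/space trade-off the abstract compares across techniques.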
Parallel Frequent Item Set Mining with Selective Item Replication
We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails some data replication, which may be reduced by assigning appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator, whereas NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.
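The key property can be checked on a toy database. In the sketch below (all names are hypothetical), the frequent-item graph is built from frequent 2-itemsets, a hand-picked separator splits it, and each subdatabase is the projection of the transactions onto one side plus the replicated separator items. Because every 2-subset of a frequent item set is itself frequent, no frequent item set can contain items from both sides, so mining the two subdatabases recovers every frequent item set of the full database; item sets made only of separator items are found on both sides, which is the replicated work mentioned above. The miner is a naive level-wise enumeration for illustration only, not NoClique or NoClique2.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def frequent_itemsets(transactions: List[FrozenSet[int]], min_support: int) -> Set[FrozenSet[int]]:
    """Naive level-wise enumeration (illustration only)."""
    items = sorted(set().union(*transactions)) if transactions else []
    result: Set[FrozenSet[int]] = set()
    level = [frozenset([i]) for i in items]
    while level:
        frequent = [s for s in level if sum(1 for t in transactions if s <= t) >= min_support]
        result.update(frequent)
        # Candidates of size k+1: unions of two frequent k-sets that differ in one item.
        level = list({a | b for a, b in combinations(frequent, 2) if len(a | b) == len(a) + 1})
    return result


def is_vertex_separator(edges: List[Tuple[int, ...]], part_a: Set[int], part_b: Set[int]) -> bool:
    """True if no frequent 2-itemset (edge) joins part_a and part_b directly."""
    return all(
        not ((u in part_a and v in part_b) or (u in part_b and v in part_a)) for u, v in edges
    )


def split_database(transactions, part_a: Set[int], part_b: Set[int], separator: Set[int]):
    """Project each transaction onto one side plus the (replicated) separator items."""
    items_a = part_a | separator
    items_b = part_b | separator
    db_a = [t & items_a for t in transactions if t & items_a]
    db_b = [t & items_b for t in transactions if t & items_b]
    return db_a, db_b
```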
A parallel progressive radiosity algorithm based on patch data circulation
Current research on radiosity has concentrated on increasing the accuracy and the speed of the solution. Although algorithmic and meshing techniques decrease the execution time, excessive computational power is still required for complex scenes. Hence, parallelism can be exploited to speed up the method further. This paper aims to provide a thorough examination of parallelism in basic progressive refinement radiosity and investigates its parallelization on distributed-memory parallel architectures. A synchronous scheme, based on static task assignment, is proposed to achieve better coherence for shooting patch selections. An efficient global circulation scheme is proposed for the parallel light distribution computations, which reduces the total volume of concurrent communication by an asymptotic factor. The proposed parallel algorithm is implemented on an Intel iPSC/2 hypercube multicomputer. The load balance qualities of the proposed static assignment schemes are evaluated experimentally. The effect of coherence in the parallel light distribution computations on the shooting patch selection sequence is also investigated. Theoretical and experimental evaluations are also presented to verify that the proposed parallelization scheme yields equally good performance on multicomputers implementing the simplest (e.g., ring) as well as the richest (e.g., hypercube) interconnection topologies. This paper also proposes a parallel load re-balancing scheme that enhances our basic parallel radiosity algorithm so that it can be used in the parallelization of radiosity methods adopting adaptive subdivision and meshing techniques. © 1996 Elsevier Science Ltd.
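The basic progressive refinement loop that the paper parallelizes can be sketched sequentially as follows. Assumptions: patches are planar (F_ii = 0), form factors are given as a dense matrix, reciprocity F_ji = F_ij A_i / A_j is used to distribute shot energy, and the shooting patch is the one with the largest unshot power. Static task assignment, patch data circulation, and load re-balancing are not modeled.

```python
from typing import List


def progressive_radiosity(
    emission: List[float],
    reflectance: List[float],
    area: List[float],
    form_factor: List[List[float]],
    tol: float = 1e-9,
    max_steps: int = 10_000,
) -> List[float]:
    """form_factor[i][j] = F_ij, the fraction of energy leaving patch i that reaches patch j."""
    n = len(emission)
    radiosity = list(emission)
    unshot = list(emission)
    for _ in range(max_steps):
        # Shooting patch selection: largest unshot power (unshot radiosity * area).
        i = max(range(n), key=lambda p: unshot[p] * area[p])
        if unshot[i] * area[i] < tol:
            break  # remaining unshot energy is negligible
        for j in range(n):
            if j == i:
                continue  # planar patches assumed: F_ii = 0
            # Light distribution from patch i to patch j, using F_ji = F_ij * A_i / A_j.
            delta = reflectance[j] * unshot[i] * form_factor[i][j] * area[i] / area[j]
            radiosity[j] += delta
            unshot[j] += delta
        unshot[i] = 0.0
    return radiosity
```

In the parallel setting, the inner loop over receiving patches j is the light distribution computation whose communication the circulation scheme is designed to reduce.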
Algorithms for efficient vectorization of repeated sparse power system network computations
Standard sparsity-based algorithms used in power system applications need to be restructured for efficient vectorization due to the extremely short vectors processed. Further, intrinsic architectural features of vector computers, such as chaining and sectioning, should also be exploited for utmost performance. This paper presents novel data storage schemes and vectorization algorithms that resolve the recurrence problem, exploit chaining, and minimize the number of indirect element selections in the repeated solution of sparse linear systems of equations widely encountered in various power system problems. The proposed schemes are also applied to the vectorization of power mismatch calculations arising in the solution phase of the fast decoupled load flow (FDLF), which involves typical repeated sparse power network computations. The relative performances of the proposed and existing vectorization schemes are evaluated, both theoretically and experimentally, on an IBM 3090/VF.
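The recurrence problem in a sparse triangular solve can be illustrated with level scheduling, a standard technique in which rows are grouped so that each row depends only on rows in earlier levels; rows within one level are independent and could be processed as a single vector operation. This Python sketch illustrates that general idea under its own assumptions (row-wise storage of the strictly lower part plus a separate diagonal); it is not the storage scheme proposed in the paper and does not model chaining or sectioning.

```python
from typing import Dict, List


def level_schedule(lower_offdiag: List[Dict[int, float]]) -> List[List[int]]:
    """Group rows so that every row depends only on rows in earlier levels.

    lower_offdiag[i] maps column j (< i) to L[i][j] for the strictly lower part.
    """
    n = len(lower_offdiag)
    level = [0] * n
    for i in range(n):
        # Row i must wait for every row j it references (all j < i, already leveled).
        level[i] = 1 + max((level[j] for j in lower_offdiag[i]), default=-1)
    levels: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def solve_lower(lower_offdiag: List[Dict[int, float]], diag: List[float], b: List[float]) -> List[float]:
    """Forward substitution L x = b, processed level by level."""
    x = [0.0] * len(b)
    for rows in level_schedule(lower_offdiag):
        # Rows in one level are mutually independent: on a vector machine this
        # loop is the candidate for vectorization.
        for i in rows:
            s = sum(v * x[j] for j, v in lower_offdiag[i].items())
            x[i] = (b[i] - s) / diag[i]
    return x
```

With very sparse power network matrices, levels tend to be short, which is consistent with the abstract's point that the vectors processed are extremely short.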
A parallel framework for in-memory construction of term-partitioned inverted indexes
With the advances in cloud computing and the huge RAMs provided by 64-bit architectures, it is possible to tackle large problems using memory-based solutions. Construction of term-based, partitioned, parallel inverted indexes is a communication-intensive task and is suitable for memory-based modeling. In this paper, we provide an efficient parallel framework for in-memory construction of term-based partitioned inverted indexes. We show that, by utilizing an efficient bucketing scheme, we can eliminate the need for generating a global vocabulary. We propose and investigate assignment schemes that can reduce the communication overheads while minimizing the storage and final query processing imbalance. We also present a study on how communication among processors should be carried out with limited communication memory in order to reduce the total inversion time. We present several different communication-memory organizations and discuss their advantages and shortcomings. The experiments indicate promising results. © 2012 The Author. Published by Oxford University Press on behalf of The British Computer Society.
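A minimal sketch of how bucketing can sidestep a global vocabulary, under two stated assumptions: terms are mapped to buckets with a deterministic hash (CRC32 here), so every processor agrees on a term's bucket without exchanging term lists, and buckets are assigned to processors with a greedy longest-processing-time rule on posting counts. Both choices are illustrative; the paper's bucketing and assignment schemes may differ, and the helper names are hypothetical.

```python
import zlib
from typing import Dict, List


def bucket_of(term: str, num_buckets: int) -> int:
    # Deterministic hash, identical on every processor, so no global vocabulary is needed.
    return zlib.crc32(term.encode("utf-8")) % num_buckets


def local_bucket_loads(postings_per_term: Dict[str, int], num_buckets: int) -> List[int]:
    """Posting counts per bucket on one processor (these would be summed across processors)."""
    loads = [0] * num_buckets
    for term, count in postings_per_term.items():
        loads[bucket_of(term, num_buckets)] += count
    return loads


def assign_buckets(bucket_loads: List[int], num_procs: int) -> List[int]:
    """Greedy LPT: heaviest bucket first, each to the currently least-loaded processor."""
    owner = [0] * len(bucket_loads)
    proc_load = [0] * num_procs
    for b in sorted(range(len(bucket_loads)), key=lambda b: -bucket_loads[b]):
        p = min(range(num_procs), key=lambda q: proc_load[q])
        owner[b] = p
        proc_load[p] += bucket_loads[b]
    return owner
```

Balancing posting counts targets storage imbalance only; the communication-aware assignment schemes described in the abstract would add further criteria.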
Active node determination for correlated data gathering in wireless sensor networks
In wireless sensor network applications where the data gathered by different sensor nodes are correlated, not all sensor nodes need to be active for the wireless sensor network to be functional. Provided that the sensor nodes selected as active form a connected wireless network, the inactive sensor nodes can be turned off. Alternating which sensor nodes are active and which are inactive during the lifecycle of the application helps the wireless sensor network achieve a longer lifetime. The problem of determining a set of active sensor nodes in a correlated data environment for a fully operational wireless sensor network can be formulated as an instance of the connected correlation-dominating set problem. In this work, our contribution is twofold: we propose an effective and runtime-efficient iterative improvement heuristic to solve the active sensor node determination problem, and a benefit function that aims to minimize the number of active sensor nodes while maximizing the residual energy levels of the selected active sensor nodes. Extensive simulations show that the proposed approach achieves good performance in terms of both network lifetime and runtime efficiency. © 2012 Elsevier B.V. All rights reserved.
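To illustrate the constraints, the sketch below uses a simplified model: an inactive node counts as covered if at least one of its correlated nodes is active (a simplification of correlation domination), and the active nodes must induce a connected subgraph of the communication graph. Starting with all nodes active, it tries to switch off nodes in increasing order of residual energy. This is a simple pruning heuristic written for illustration, not the paper's iterative improvement heuristic or benefit function; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def is_connected(active: Set[int], comm: Dict[int, Set[int]]) -> bool:
    """BFS over the communication graph restricted to active nodes."""
    if not active:
        return False
    start = next(iter(active))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in comm[u]:
            if v in active and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == active


def is_correlation_dominating(active: Set[int], corr: Dict[int, Set[int]], nodes: Set[int]) -> bool:
    # Simplifying assumption: an inactive node is covered if any correlated node is active.
    return all(v in active or bool(corr[v] & active) for v in nodes)


def select_active_nodes(comm: Dict[int, Set[int]], corr: Dict[int, Set[int]], energy: Dict[int, float]) -> Set[int]:
    nodes = set(comm)
    active = set(nodes)
    # Switch off low-energy nodes first so that high-energy nodes tend to stay active.
    for v in sorted(nodes, key=lambda u: energy[u]):
        candidate = active - {v}
        if is_connected(candidate, comm) and is_correlation_dominating(candidate, corr, nodes):
            active = candidate
    return active
```

On a four-node path where each node is correlated with its neighbors, for example, only the two high-energy middle nodes remain active.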
- …